ABSTRACT

The nucleotide sequence of the RNA encoding the nucleocapsid protein of 
coronavirus MHV-A59 has been determined. Copy DNA was prepared from mRNA 
isolated from virally infected cells, fragmented and cloned in the phage
vector M13mp8 for direct sequence determination. A sequence of 1817
nucleotides, adjacent to the viral poly-A tail, was obtained. It contains
a single long open reading frame encoding a protein of mol. wt. 49,660,
which is enriched in basic residues.

INTRODUCTION

The coronaviruses comprise a large group of enveloped RNA viruses isolated 
from a range of animal hosts (for review see 1). They have been associated 
with a variety of respiratory and gastro-intestinal infections and
neurological disorders, and may also provide a model for the study of persistent
viral infection. 

The coronavirus strain A59, a mouse hepatitis virus, can be propagated in 
cell culture, and its molecular biology has been studied in some detail. The 
virion contains a single positive-stranded RNA, 18 kb in length, associated
with a nucleocapsid protein (2). The viral lipid envelope contains two
glycoproteins: E1, of mol. wt. 24,000, occurring in unglycosylated as well
as O-glycosylated forms (3,4,5), and E2 (mol. wt. 90,000/180,000), which
forms the surface projections or peplomers characteristic of the coronavirus
virion (2,4).

Two features of the life cycle of coronaviruses are of particular interest at
the molecular level. First, the viral mRNA's produced during infection form
a nested set, corresponding to the 3' end of the virion RNA but extending
to different lengths towards the 5' end; the largest is identical to the
virion RNA (6-12). From each of the RNA's (seven in the case of MHV-A59), only
the 5'-proximal gene is translated (7,13,14). Thus, the coronaviruses have a
replication strategy that differs from any so far reported for RNA viruses.
Second, the virus buds intracellularly, into the endoplasmic reticulum and
perhaps Golgi membranes (15,16). The factors that specify this site of
assembly, rather than the plasma membrane, are at present unknown.
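The nested-set arrangement described above can be pictured with a toy model. The sketch below is only an illustration: it assumes seven genes in 5'-to-3' order with hypothetical labels (gene1 to gene7, gene7 being the 3'-terminal gene carried alone by RNA7), and encodes the rule that only the 5'-proximal gene of each mRNA is translated. Under these assumptions every gene ends up as the translated gene of exactly one mRNA.

```python
# Toy model of a 3'-coterminal nested set of coronavirus mRNAs.
# Gene labels are hypothetical placeholders, ordered 5' -> 3' on the genome.
GENES = [f"gene{i}" for i in range(1, 8)]


def nested_mrnas(genes):
    """Return mRNAs as gene lists: RNA1 (largest) spans all genes, and each
    smaller RNA lacks one more gene from the 5' end (all share the 3' end)."""
    return {f"RNA{k + 1}": genes[k:] for k in range(len(genes))}


def translated_gene(mrna):
    """Under the model, only the 5'-proximal gene of an mRNA is translated."""
    return mrna[0]


if __name__ == "__main__":
    for name, mrna in nested_mrnas(GENES).items():
        print(name, "carries", len(mrna), "gene(s); translated:", translated_gene(mrna))
```

The model makes the point of the paragraph explicit: a single promoter-like mechanism is not needed per gene, because each gene becomes 5'-proximal on its own subgenomic message.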

Here we report a nucleotide sequence of cloned copy DNA prepared from MHV-A59 
mRNA. A sequence of 1817 nucleotides, ending in the polyadenylation site, 
has been determined. Translation of the sequence predicts a polypeptide whose 
size, general features and genetic location are consistent with its being 
the viral nucleocapsid protein. 

MATERIALS AND METHODS 
cDNA preparation 

Total poly(A)+ RNA from MHV-A59-infected Sac cells was prepared as described
(17). First-strand cDNA was synthesized in a mixture containing RNA (50 µg),
Tris-Cl pH 8.3 (50 mM), KCl (50 mM), MgCl2 (8 mM), dithiothreitol (1 mM),
oligo(dT)12-18 (P-L Biochemicals), sodium pyrophosphate (2 mM), dATP, dGTP,
dTTP (each 1 mM), dCTP (0.5 mM), [α-32P]dCTP (50 µCi; Amersham) and AMV reverse
transcriptase (350 units; kindly provided by Dr. J. Beard) in a total volume
of 200 µl. The mixture was incubated for 30 min at 41°C, then for 15 min
at 45°C. EDTA was added to 20 mM, the material was extracted twice with
phenol/chloroform, and the pooled aqueous phases were extracted twice with
ether. The products were precipitated with ethanol and redissolved in 50 µl
5 mM Tris, 1 mM EDTA, pH 7.5. The material was loaded on a 1% low-melting-
temperature agarose (BRL) gel, and electrophoresis was carried out at 20 V/cm
for 60 min to remove low-molecular-weight material. Regions of the gel
containing cDNA were identified by autoradiography and cut into 1 mm slices;
each slice was melted and phenol-extracted, and the cDNA was precipitated
with ethanol. RNA was hydrolysed by incubating the material from each slice
in 0.2 M KOH for 10 min at 65°C in a volume of 20 µl, and the mixture was
neutralized with HCl. The cDNA was converted to the double-stranded form in
a mixture containing HEPES/KOH, pH 6.9 (100 mM), MgCl2 (4 mM), dithiothreitol
(0.5 mM), dATP, dCTP, dGTP and dTTP (each 1 mM), KCl (50 mM) and Klenow
polymerase (20 units; Boehringer) in a volume of 100 µl. The mixture was
incubated for 3 h at 17°C, and the DNA was precipitated with ethanol. Yields
obtained from each gel fraction were measured by TCA precipitation of aliquots
on filters, followed by scintillation counting.
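The TCA-precipitation counts can be converted into an approximate cDNA mass with simple arithmetic. The sketch below is an illustration, not the authors' calculation. It assumes that the fraction of [α-32P]dCTP counts that becomes acid-precipitable equals the fraction of the 0.5 mM dCTP pool (in 200 µl) that was incorporated, that the cDNA contains 25% C, and an average mass of 330 g/mol per nucleotide. The function name and the count values in the example are hypothetical.

```python
def cdna_mass_ng(cpm_precipitated, cpm_total,
                 dctp_mM=0.5, volume_ul=200.0,
                 c_fraction=0.25, nt_mass_g_per_mol=330.0):
    """Estimate first-strand cDNA mass (ng) from TCA-precipitable counts.

    Assumptions: labelled and unlabelled dCTP are incorporated equally,
    the product contains `c_fraction` C, and each nucleotide averages
    `nt_mass_g_per_mol`.
    """
    fraction_incorporated = cpm_precipitated / cpm_total
    # mM x ul = nmol, because 1e-3 mol/L x 1e-6 L = 1e-9 mol.
    dctp_nmol = dctp_mM * volume_ul
    dcmp_incorporated_nmol = fraction_incorporated * dctp_nmol
    # Scale up from C residues to all residues in the cDNA.
    total_nt_nmol = dcmp_incorporated_nmol / c_fraction
    # g/mol is numerically equal to ng/nmol.
    return total_nt_nmol * nt_mass_g_per_mol
```

For instance, if 1% of the input counts were precipitable, `cdna_mass_ng(10_000, 1_000_000)` gives about 1320 ng, i.e. roughly 1.3 µg of first-strand cDNA; these counts are illustrative only.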
Fragmentation and cloning of cDNA 

Portions (30 ng) of the double-stranded cDNA were cleaved with one of the
restriction enzymes HaeIII (kindly supplied by V. Pirrotta), FnuDII or RsaI
(both New England Biolabs) under standard conditions. The DNA was purified
by phenol extraction and ethanol precipitation, and randomly ligated to 10 ng
of M13mp8 (18) replicative-form DNA that had previously been linearized with
SmaI (New England Biolabs), in a mixture containing Tris-HCl, pH 7.5 (50 mM),
MgCl2 (10 mM), dithiothreitol (1 mM), ATP (0.2 mM) and T4 ligase (6 units;
kindly supplied by R. Brown) in a volume of 10 µl. The ligations were
incubated overnight at room temperature.
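HaeIII (GG^CC), FnuDII (CG^CG) and RsaI (GT^AC) all cut in the middle of 4-bp palindromes and leave blunt ends, which is why their fragments can be ligated directly into SmaI-cut (blunt, CCC^GGG) M13mp8. The sketch below predicts the fragments of a complete digest on one strand; the enzyme table lists the standard recognition sites, while the function names are hypothetical and effects such as methylation or incomplete digestion are ignored.

```python
# Recognition site and cut offset within the site (all blunt cutters).
ENZYMES = {
    "HaeIII": ("GGCC", 2),   # GG^CC
    "FnuDII": ("CGCG", 2),   # CG^CG
    "RsaI": ("GTAC", 2),     # GT^AC
}


def cut_positions(seq, enzyme):
    """0-based positions at which `enzyme` cuts `seq` (the cut lies before that base)."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    hit = seq.find(site)
    while hit != -1:
        cuts.append(hit + offset)
        hit = seq.find(site, hit + 1)  # step by one so overlapping sites are found
    return cuts


def digest(seq, enzyme):
    """Split `seq` into the fragments produced by a complete digest."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

Because a four-base site occurs on average about once every 256 bp in random sequence, each enzyme cuts an 1800-nucleotide cDNA into fragments of a few hundred nucleotides, a convenient size for a single sequencing gel reading.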

Alternatively, cDNA was treated with nuclease S1. cDNA (80 ng) was incubated
in 60 µl of a solution containing sodium acetate (30 mM, pH 5.2), NaCl (0.3 M),
ZnCl2 (2 mM) and 10 units of S1 nuclease (BRL) for 3 min at 37°C and 10 min at
room temperature. EDTA was added to 5 mM, the mixture was extracted twice with
phenol/chloroform and washed with chloroform and ether, and the DNA was
precipitated with ethanol; approximately 50% of the input radiolabel remained
TCA-precipitable. The DNA was then treated with Klenow polymerase in a 10 µl
volume containing Tris-Cl (10 mM, pH 7.5), MgCl2 (10 mM), NaCl (50 mM),
dithiothreitol (0.5 mM), all four deoxynucleotides (each 0.4 mM) and Klenow
polymerase (1 unit) for 15 min at room temperature. EDTA was added to 20 mM
and the mixture was phenol-extracted and ethanol-precipitated as before. The
DNA was then ligated to SmaI-digested M13mp8 exactly as for restricted cDNA.

The products of the ligation reactions were used to transfect competent
E. coli cells; lac⁻ plaques were picked and grown, and the viral DNA was
isolated and used directly for single-nucleotide dideoxy sequencing of the
inserted DNA (all as in 19), without any prior screening. Clones of interest
were sequenced using a 15-base synthetic single-stranded oligonucleotide
(P-L Biochemicals) as universal primer. Reaction products were analysed on
0.2 mm thick, thermostatted 6% acrylamide gels (20), 40 cm or 60 cm in length.
Sequences were stored, and overlaps identified, by computer, using the program
package of Staden (21).
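Reading a dideoxy gel amounts to merging four chain-termination ladders by fragment length. The sketch below assumes band lengths (in nucleotides from the primer) have already been measured and are supplied per lane; lengths claimed by no lane, or by more than one lane, are reported as N. The function name and input format are hypothetical. The sequence obtained is that of the newly synthesized strand, i.e. the complement of the M13 template.

```python
def read_ladder(lanes):
    """Merge dideoxy lanes {'A': [lengths], 'C': [...], ...} into a sequence.

    Returns the synthesized-strand sequence 5'->3' (shortest fragment first).
    Positions with no band, or with bands in several lanes, are called 'N'.
    """
    calls = {}
    for base, lengths in lanes.items():
        for length in lengths:
            calls.setdefault(length, set()).add(base)
    if not calls:
        return ""
    read = []
    for length in range(min(calls), max(calls) + 1):
        bases = calls.get(length, set())
        # Exactly one lane terminated at this length: unambiguous call.
        read.append(next(iter(bases)) if len(bases) == 1 else "N")
    return "".join(read)
```

For example, bands at lengths 1 and 4 in the A lane, 2 in C, 3 in G and 5 in T read as `ACGAT`.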

RESULTS AND DISCUSSION 
"Shotgun" cloning of viral cDNA 

A total of 48 recombinant clones were sequenced; of these, 33 could be
assembled into a contiguous consensus sequence of 1817 nucleotides, followed
by a poly-A tract of variable length (Fig. 1). The remaining 15 clones, none of
which overlapped with one another, presumably arose from cellular mRNA present
in the starting material, or from regions towards the 5' end of the viral
genome.
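The assembly itself was done with Staden's programs (21). As a stand-in, and not a reproduction of that package, the sketch below shows the basic idea of merging overlapping gel readings into a consensus with a greedy longest-overlap rule. It assumes error-free readings all given in the same orientation (real readings also need reverse-complement comparison and tolerance for errors), and the minimum overlap is a hypothetical parameter.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=10):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(r.upper() for r in reads))  # drop exact duplicates
    # Drop reads wholly contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != s and r in s for s in contigs)]
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]
    return contigs
```

Readings that share no overlap, like the 15 unassigned clones, simply remain as separate contigs.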

Several of the clones generated by S1-nuclease digestion of the cDNA diverged
at one end from the main consensus (Fig. 1). With the possible exception of
clone S9 (see below), these were probably due to short 3' extensions remaining
on some cDNA fragments after treatment with S1 nuclease and Klenow polymerase;
these could promote simultaneous ligation of more than one fragment to a
single vector molecule. Therefore, sequences obtained from S1-derived clones
were used only to confirm regions of the consensus obtained from
restricted-cDNA clones.

Structure of the MHV-A59 RNA's 

A sequence of 1817 nucleotides, adjoining a poly-A tract, is shown in Fig. 2.
The viral origin and strand orientation of the sequence were confirmed by
dot hybridization of radiolabelled, fragmented virion RNA to the single-stranded
recombinant DNA's (not shown).

Comparison of the sequence with the analysis of RNase T1 oligonucleotides from
various viral RNA's reported by Lai et al. (22) showed good, although not
perfect, agreement between the two sets of data (Table 1). Of the nine
oligonucleotides isolated from RNA7, seven have similar corresponding sequences
in Fig. 2. The two that were not identified, spots 10 and 19, probably
correspond to two spots that are also not found in 3' fragments of the virion
RNA (6).
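RNase T1 cleaves RNA on the 3' side of every G, so each T1 oligonucleotide except the 3'-terminal one ends in Gp. Predicting the T1 products of a cDNA sequence is therefore straightforward, and the larger products are the ones that can be compared with fingerprint spots. The sketch below is illustrative; the function name and the minimum-length filter are hypothetical choices, not values from Lai et al. (22).

```python
def t1_digest(seq, min_len=1):
    """Predict RNase T1 products: cleavage 3' of every G.

    `seq` may be cDNA (T) or RNA (U); products are returned as RNA, 5'->3'.
    Fragments shorter than `min_len` are dropped.
    """
    rna = seq.upper().replace("T", "U")
    products, current = [], []
    for base in rna:
        current.append(base)
        if base == "G":  # T1 cuts after G, so each product ends in G(p)
            products.append("".join(current))
            current = []
    if current:  # 3'-terminal fragment (no G at its end)
        products.append("".join(current))
    return [p for p in products if len(p) >= min_len]
```

Applied to nucleotides 132-156 as printed in Fig. 2 (`TATGGTAATCTAAACTTTAAGGATG`), `t1_digest(..., min_len=10)` returns the single product `UAAUCUAAACUUUAAG`, which corresponds to nucleotides 137-152.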

It has been proposed that these oligonucleotides reflect the existence of a 
short 5' sequence common to all seven viral RNA's, and that oligonucleotides 
19 and 19a cross the joining site between the leader sequence and the coding 
region in RNA's 7 and 6 respectively (8,6). Nucleotides 137-152 in Fig. 2
show some similarity to both of these oligonucleotides, and also to spot 17,
which is found in all the larger RNA's but not in RNA7 (see Table 1). If the
Figure 1.

Arrangement of clones used to construct the nucleocapsid sequence. Arrows show
the direction in which the sequence was determined (the complement of the M13
single-stranded DNA). The 5' end of clone F9 is an anomalous cleavage; all other
restricted-cDNA clones are bounded by correct restriction sites. The sequence
of clone F15 was readable beyond 600 nucleotides with sufficient accuracy to
confirm its overlap with clone H9. Diagonal lines represent regions of
divergence from the consensus sequence.

   1 CGCTTATAAGTGCAAAGGTGTGACACCTTAGCAACTGTAACATAAGCGGGCAAGTCAGAC
  61 AAAGAGAAACCGGTGCCTAGCTTAACAGCCNTGCATGTAGAGGTGGCCACGAATAATAGT
                                      M  S  F  V  P  G  Q  E  N
 121 GGCTGTTAGTGTATGGTAATCTAAACTTTAAGGATGTCTTTTGTTCCTGGGCAAGAAAAT
     A  G  G  R  S  S  S  V  N  R  A  G  N  G  I  L  K  K  T  T
 181 GCCGGTGGCAGAAGCTCCTCTGTAAACCGCGCTGGTAATGGAATCCTCAAGAAGACCACT
     W  A  D  Q  T  E  R  G  P  N  N  Q  N  R  G  R  R  N  Q  P
 241 TGGGCTGACCAAACCGAGCGTGGACCAAATAATCAAAATAGAGGCAGAAGGAATCAGCCA
     K  Q  T  A  T  T  Q  P  N  S  G  S  V  V  P  H  Y  S  W  F
 301 AAGCAGACTGCAACTACTCAACCCAACTCCGGGAGTGTGGTTCCCCATTACTCCTGGTTT
     S  G  I  T  Q  F  Q  K  G  K  E  F  Q  F  A  E  G  Q  G  V
 361 TCTGGCATTACCCAGTTCCAAAAGGGAAAGGAGTTTCAGTTTGCAGAAGGACAAGGAGTG
     P  I  A  N  G  I  P  A  S  E  Q  K  G  Y  W  Y  R  H  N  A
 421 CCTATTGCCAATGGAATCCCCGCTTCAGAGCAAAAGGGATATTGGTATAGACACAACGCC
     V  L  L  K  H  L  M  G  S  R  S  N  Y  C  P  D  G  I  F  T
 481 GTTCTTTTAAAACACCTGATGGGCAGCAGAAGCAATTACTGCCCAGATGGTATTTTTACT
     I  L  A  Q  G  P  M  L  E  P  V  M  E  T  A  L  K  E  S  S
 541 ATCTTGGCACAGGGCCCCATGCTGGAGCCAGTTATGGAGACAGCATTGAAGGAGTCTTCT
     G  L  Q  T  A  K  R  T  P  I  P  A  L  I  L  S  K  G  T  Q
 601 GGGTTGCAAACAGCCAAGCGGACACCAATACCCGCTCTGATATTGTCGAAAGGGACCCAA
     A  V  M  R  L  F  L  L  G  L  R  P  A  R  Y  C  L  R  A  F
 661 GCAGTCATGAGGCTATTCCTACTAGGTTTGCGCCCGGCACGGTATTGCCTCAGGGCTTTT
     M  L  K  A  L  E  G  L  H  L  L  A  D  L  V  R  G  H  N  P
 721 ATGTTGAAGGCTCTGGAAGGTCTGCACCTGCTAGCCGATCTGGTTCGCGGTCACAATCCC
     V  G  Q  I  M  R  A  R  S  S  S  N  Q  R  Q  P  A  S  T  V
 781 GTGGGCCAAATAATGCGCGCTAGAAGCAGTTCCAACCAGCGCCAGCCTGCCTCTACTGTA
     K  P  D  M  A  E  E  I  A  P  L  V  L  A  K  L  G  K  D  A
 841 AAACCTGATATGGCCGAAGAAATTGCTCCTCTTGTTTTGGCTAAGCTCGGTAAAGATGCC







Nucleic Acids Research 


     G  Q  P  K  Q  V  T  K  Q  S  A  K  K  V  R  Q  K  I  L  N
 901 GGCCAGCCCAAGCAAGTAACGAAGCAAAGTGCCAAAAAAGTCAGGCAGAAAATTTTAAAC
     K  P  R  Q  K  R  T  P  N  K  Q  C  P  V  Q  Q  C  F  G  K
 961 AAGCCTCGCCAAAAGAGGACTCCAAACAAGCAGTGCCCAGTGCAGCAGTGTTTTGGAAAG
     R  G  P  N  Q  N  F  G  G  S  E  M  L  K  L  G  T  S  D  P
1021 AGAGGCCCCAATCAGAATTTTGGAGGCTCTGAAATGTTAAAACTTGGAACTAGTGATCCA
     Q  F  P  I  L  A  E  L  A  P  T  V  G  A  F  F  F  G  S  K
1081 CAGTTCCCCATTCTTGCAGAGTTGGCTCCAACAGTTGGTGCCTTCTTCTTTGGATCTAAA
     L  E  L  V  K  K  N  S  G  G  A  D  E  P  T  K  D  V  Y  E
1141 TTAGAATTGGTCAAAAAGAATTCTGGTGGTGCTGATGAACCCACCAAAGATGTGTATGAG
     L  Q  Y  S  G  A  V  R  F  D  S  T  L  P  G  F  E  T  I  M
1201 CTGCAATATTCAGGTGCAGTTAGATTTGATAGTACTCTACCTGGTTTTGAGACTATCATG
     K  V  L  N  E  N  L  N  A  Y  Q  K  D  G  G  A  D  V  V  S
1261 AAAGTGTTGAATGAGAATTTGAATGCCTACCAGAAGGATGGTGGTGCAGATGTGGTGAGC
     P  K  P  Q  R  K  G  R  R  Q  A  Q  E  K  K  D  E  V  D  N
1321 CCAAAGCCCCAAAGAAAAGGGCGTAGACAGGCTCAGGAAAAGAAAGATGAAGTAGATAAT
     V  S  V  A  K  P  K  S  S  V  Q  R  N  V  S  R  E  L  T  P
1381 GTAAGCGTTGCAAAGCCCAAAAGCTCTGTGCAGCGAAATGTAAGTAGAGAATTAACCCCA
     E  D  R  S  L  L  A  Q  I  L  D  D  G  V  V  P  D  G  L  E
1441 GAGGATAGAAGTCTGTTGGCTCAGATCCTAGATGATGGCGTAGTGCCAGATGGGTTAGAA
     D  D  S  N  V  *
1501 GATGACTCTAATGTGTAAAGAGAATGAATCCTATGTCGGCGCTCGGTGGTAACCCTCGCG
1561 AGAAAGTCGGGATAGGACACTTTCTATCAGAATGGATGTCTTGCTGTCATAACAGATAGA
1621 GAAGGTTGTGGCTGCCCTGTATCAATTAGTTGAAAGAGATTGCAAAATAGAGAATGTGTG
1681 AGAGAAGTTAGCAAGGTCCTACGTCTAACCATAAGAACGGCGATAGGCGCCCCCTGGGAA
1741 GAGCTCACATCAGGGTACTATTCTTGCAATGCCCTAGTAAATGAATGAAGTTGATCATGG
1801 CCAATTGGAAGAATCACAAAAAAAAAAAAAAAAAAAAAAA

Fig. 2. Sequence of the MHV-A59 nucleocapsid gene and protein.

components of all three oligonucleotides are ordered to maximise homology
with the cDNA sequence (Fig. 3), the predicted sequences are consistent with
the earlier assignments of spots 19 and 19a by Lai et al. (22), and in addition
suggest that spot 17 represents an internal region of the larger RNA's which
is immediately upstream of the nucleocapsid gene. It is of interest that
clone S9 diverges from the consensus sequence (Fig. 1) at almost exactly the
inferred site of divergence between oligonucleotides 19 and 17, suggesting
that its divergence is due not to a cloning artefact but to its originating
from a different viral RNA.
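Fingerprint analysis often yields oligonucleotides whose base composition is known but only part of whose order is determined (the bracketed part in the Figure 3 notation). Placing such an oligonucleotide on the cDNA sequence then means finding a block with the right composition immediately followed by the ordered part. The sketch below is a simplified, hypothetical helper: it requires an exact composition match and an exact ordered tail, whereas real fingerprint assignments must also tolerate mismatches and other partial orderings.

```python
from collections import Counter


def place_partial_oligo(seq, unordered, ordered_tail):
    """0-based start positions in `seq` where a block with the base
    composition of `unordered` (order unknown) is immediately followed
    by the fully ordered `ordered_tail`. Accepts DNA (T) or RNA (U)."""
    seq = seq.upper().replace("U", "T")
    unordered = unordered.upper().replace("U", "T")
    tail = ordered_tail.upper().replace("U", "T")
    n = len(unordered)
    target = Counter(unordered)
    hits = []
    for i in range(len(seq) - n - len(tail) + 1):
        block = seq[i:i + n]
        # Composition must match exactly; the tail must match base for base.
        if Counter(block) == target and seq[i + n:i + n + len(tail)] == tail:
            hits.append(i)
    return hits
```

For example, with the stretch `TATGGTAATCTAAACTTTAAGGATG`, an oligonucleotide written as [A,T,A] followed by `UCUAAACUUUAAG` is placed at offset 5, the only position where both conditions hold.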

Sequence of the nucleocapsid gene 

The extreme 3' gene of MHV-A59, corresponding to RNA7, is known to encode the
viral nucleocapsid protein (7,13). Translation of the sequence (Fig. 2) gives
only one long open reading frame, from nucleotides 154 to 1515, predicting a 
protein of mol. wt. 49,660 which is enriched in basic residues. These features 
are entirely consistent with the observed electrophoretic mobility of the 
nucleocapsid protein (2) and its function of binding to viral RNA. There is 
also a short open reading frame between nucleotides 218 and 488, predicting 
a polypeptide of 90 amino acids, but there is at present no evidence for the 
existence of such a protein. 
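The reading-frame analysis can be reproduced in a few lines. The sketch below scans the three forward frames for ATG-to-stop open reading frames and reports 1-based coordinates in the convention the text appears to use: first base of the ATG to the last base before the stop codon (the TAA stop at 1516-1518 in Fig. 2). By that convention, nucleotides 154-1515 comprise 1362 nucleotides, or 454 codons. The sketch also translates with the standard genetic code and estimates a polypeptide mass from average residue masses. The function names are hypothetical, and the residue-mass table is a standard one supplied here rather than taken from the paper, so a value computed this way need not reproduce 49,660 exactly.

```python
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
# Standard genetic code, codons ordered TCAG x TCAG x TCAG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

# Average residue masses (Da) of amino acids within a peptide chain.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def find_orfs(seq, min_codons=1):
    """(start, end), 1-based: first base of ATG .. last base before the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i))
                start = None
    return sorted(orfs, key=lambda se: se[1] - se[0], reverse=True)


def translate(seq):
    """Translate in frame 1; unknown codons become 'X', stops '*'."""
    seq = seq.upper()
    return "".join(CODE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def protein_mass(protein):
    """Approximate mass (Da) of a polypeptide from average residue masses."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER
```

With the full Fig. 2 sequence held in a string `seq`, `translate(seq[153:1515])` would give the predicted protein; the result is only as reliable as the transcribed sequence.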

A search for homology with the nucleocapsid gene sequences of snowshoe hare
bunyavirus (23), Sindbis virus (24), influenza virus (25) and vesicular
stomatitis virus (26), using the computer program SEQFIT (21), revealed no
significant similarities, confirming that the coronaviruses form a distinct
viral group.
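SEQFIT belongs to the Staden package and its algorithm is not described here. As a stand-in, the sketch below computes a simple ungapped comparison: the highest fraction of identical residues over any window of fixed length along any diagonal, the kind of signal a dot-matrix comparison displays. The function name and window length are hypothetical choices, and judging significance would additionally require comparison with shuffled sequences.

```python
def best_window_identity(a, b, window=20):
    """Highest fraction of identical residues in any ungapped window of
    length `window` shared by `a` and `b`, over all diagonal offsets."""
    a, b = a.upper(), b.upper()
    if min(len(a), len(b)) < window:
        raise ValueError("sequences shorter than window")
    best = 0
    for offset in range(-(len(a) - window), len(b) - window + 1):
        # Pair a[i] with b[i + offset] along this diagonal.
        lo = max(0, -offset)
        hi = min(len(a), len(b) - offset)
        matches = [a[i] == b[i + offset] for i in range(lo, hi)]
        run = sum(matches[:window])
        best = max(best, run)
        # Slide the window along the diagonal, updating the match count.
        for k in range(window, len(matches)):
            run += matches[k] - matches[k - window]
            best = max(best, run)
    return best / window
```

Identical or shifted copies of a segment score 1.0, while unrelated sequences of random composition typically score near the background identity expected by chance.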

This is the first primary structure, of either nucleic acid or protein, to be 
determined from a coronavirus. Experiments are now in progress, using
restriction fragments prepared from the clones described above, to determine the
precise sequence relationships between the virion and messenger RNA's, and 

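[Figure 3 panel, partly lost in extraction: the cDNA sequence around nucleotides 140-150, printed as GTAGGTAATCTAAACTTTAAGGATG and ending at the Met initiation codon, aligned with oligonucleotides 19, 19a and 17; one oligonucleotide row reads [t,t,a a a t,c]t AATCTAAACTATATG.]

Figure 3

Alignment of oligonucleotides described by Lai et al. (22) with the sequence
from Fig. 2. Oligonucleotide 19 was found uniquely in RNA7, 19a uniquely in
RNA6, and 17 in all the viral RNA's except RNA7 (6,22).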
to clone, specifically, the gene adjacent to the nucleocapsid gene, which
encodes the viral glycoprotein E1 (13,14).